[Dist][Release] Use one wheel for different CUDA version by oraluben · Pull Request #1826 · tile-ai/tilelang

oraluben · 2026-02-09T09:47:29Z

Follow-up to #1821.
Validate that wheels built with CUDA 12.8 work with CUDA 11.8 (torch 2.4) and CUDA 13 (torch 2.10).

After this PR, libcudart from the host toolchain is not loaded.

For libnvrtc, both the host lib and the pip lib are loaded by torch. This behavior is unchanged from torch 2.4 to 2.10.

Summary by CodeRabbit

Chores
- Simplified CI by removing nightly CUDA 13.0 branches and related nightly-specific handling; unified Python/version mapping and build/test flows to use a single version variable.
Refactor
- Streamlined CUDA runtime and NVRTC probing to prefer globally available symbols, reduced initialization complexity, and fail fast with clearer error messages when required symbols are absent.
- Adjusted stub initialization to proceed to symbol validation rather than early-returning.
Documentation
- Added detailed runtime internals docs and motivation notes explaining the CUDA stub design, loading strategy, and build-time toggle.

github-actions · 2026-02-09T09:47:38Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-02-09T09:47:59Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Removed nightly-specific CI branches and simplified CUDA/NVRTC/CUDART stub loading to probe global symbols first via dlsym(RTLD_DEFAULT) then dlsym(RTLD_NEXT), emitting an error and aborting if target symbols are absent.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI/CD Workflow: `.github/workflows/dist.yml` | Removed Nightly-CUDA-13.0 entries and nightly-specific branches; dropped Python 3.9 test-version mapping; switched `setup-uv` to use `matrix.python-version`; removed UV_INDEX/nightly Torch-install logic. |
| CUDA Runtime & NVRTC stubs: `src/target/stubs/cudart.cc`, `src/target/stubs/nvrtc.cc` | Replaced multi-SONAME/dlopen probing and RTLD_NOLOAD reuse with global-symbol probes using `dlsym(RTLD_DEFAULT)` then `dlsym(RTLD_NEXT)` (checks `cudaGetErrorString` / `nvrtcVersion`); added `_GNU_SOURCE`, `stdio.h`, `stdlib.h`; emit standardized error and `abort()` if symbols missing; removed path-based dlopen loops. |
| Docs & Header: `docs/runtime_internals/stubs.md`, `src/target/stubs/cuda.h` | Added detailed documentation of stub mechanism, loading strategy, error semantics, and build-time toggle; expanded header motivation text for lazy-loading CUDA stubs. |

Sequence Diagram(s)

sequenceDiagram
  participant App as "App / Stub"
  participant Linker as "Dynamic Linker\n(RTLD_DEFAULT / RTLD_NEXT)"
  participant Lib as "CUDA lib\n(if loaded)"

  App->>Linker: dlsym(RTLD_DEFAULT, "symbol")
  alt found in RTLD_DEFAULT
    Linker-->>App: symbol pointer
    App->>Lib: call via global symbol (no dlopen)
  else not found
    App->>Linker: dlsym(RTLD_NEXT, "symbol")
    alt found in RTLD_NEXT
      Linker-->>App: symbol pointer
      App->>Lib: call via next-symbol resolver
    else not found
      Linker-->>App: NULL
      App->>App: print error and abort()
    end
  end
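
The probing order in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from `cudart.cc` or `nvrtc.cc`: the helper name `ResolveGlobalSymbol` is hypothetical, and only `dlsym`, `RTLD_DEFAULT`, and `RTLD_NEXT` are taken from the walkthrough.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc typically requires this for RTLD_DEFAULT / RTLD_NEXT
#endif
#include <dlfcn.h>

// Hypothetical helper: look a symbol up in the global scope first, then in
// the objects that follow the caller in the library search order.
// Returns nullptr when neither lookup finds the symbol.
void* ResolveGlobalSymbol(const char* name) {
  if (void* sym = dlsym(RTLD_DEFAULT, name)) {
    return sym;
  }
  return dlsym(RTLD_NEXT, name);
}
```

In this sketch a null result means "no loaded library provides the symbol"; according to the walkthrough, the PR prints an error and calls `abort()` at that point instead of returning.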

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

[Dist] Add lazy-loading stubs for CUDART + NVRTC (CUDA 11/12/13 compatible wheels) #1821: Modifies the same stub files (src/target/stubs/cudart.cc, src/target/stubs/nvrtc.cc) and implements an alternative loading strategy.
[Release] Fix race condition when publishing #1578: Changes CI workflow and CUDA/nightly build matrix entries, overlapping with .github/workflows/dist.yml edits.
[Enhancement] Allow import tilelang on CPU-only machines without CUDA libraries #1481: Touches CUDA stub/runtime-loading code and related headers, code-level related to stub resolution.

Suggested labels

enhancement, dependencies

Suggested reviewers

XuehaiPan
LeiWang1999

Poem

🐰 I hopped through symbols, keen and bright,
I sniffed DEFAULT first in the night.
If NEXT replied, I twitched my nose,
If none were found, I thumped my woes.
A tidy hop, and off I go — goodnight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title '[Dist][Release] Use one wheel for different CUDA version' directly addresses the main objective: enabling a single wheel to work across multiple CUDA versions (11.8, 12.8, 13) by refactoring the stub loading mechanism. |


coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.github/workflows/dist.yml (1)

117-126: ⚠️ Potential issue | 🟡 Minor

Wheels only validated with basic importability — cross-CUDA runtime testing missing.

The dist.yml workflow builds wheels with CUDA 12.8 (lines 118–120) and only tests basic importability: import tilelang (line 203). If wheels built with CUDA 12.8 are claimed to work with CUDA 11.8 and CUDA 13, no CI workflow validates this. The ci.yml workflow also tests only CUDA 12.8, with no matrix entries for other CUDA versions.

🤖 Fix all issues with AI agents

In `@src/target/stubs/cudart.cc`:
- Around line 64-75: TryLoadLibCudart currently returns RTLD_DEFAULT which can
resolve to the stub's own exported cudaGetErrorString causing infinite recursion
and also fails to dlopen libcudart from disk; update TryLoadLibCudart to first
probe with dlsym(RTLD_NEXT, "cudaGetErrorString") (or check a sentinel boolean
that marks "in-stub" symbols to avoid self-resolution) and if that fails attempt
to dlopen common libcudart filenames (e.g., "libcudart.so", versioned names)
with RTLD_NOW|RTLD_LOCAL and return that handle; ensure CreateCUDARuntimeAPI
still uses the returned handle (not the stub) so function pointers point to real
libcudart symbols and not the stub to avoid recursion.

In `@src/target/stubs/nvrtc.cc`:
- Around line 40-50: Replace the fragile dlsym(RTLD_DEFAULT, "nvrtcVersion")
check in TryLoadLibNvrtc with a robust load/check sequence: avoid using
RTLD_DEFAULT (which can return the stub's own nvrtcVersion) and instead first
try dlopen with RTLD_NOLOAD on likely lib names to detect an already-loaded
libnvrtc, then if not found attempt dlopen of the library file (e.g.,
"libnvrtc.so" variants) to load it from disk; use the returned dlopen handle and
dlsym(handle, "nvrtcVersion") to confirm the symbol, and return that handle (or
nullptr on failure) — update TryLoadLibNvrtc and all callers to use the dlopen
handle rather than RTLD_DEFAULT to prevent self-resolution and allow standalone
loading.
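
The self-resolution concern raised in both comments can be checked by asking the dynamic linker which shared object owns a resolved address. The sketch below is only a diagnostic aid under stated assumptions, not something this PR adds: `OwnerOfGlobalSymbol` is a hypothetical name, and it relies on `dladdr`, which is a glibc/BSD extension rather than part of the stub code discussed here.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed on glibc for RTLD_DEFAULT and dladdr
#endif
#include <dlfcn.h>
#include <string>

// Hypothetical diagnostic: return the path of the object that provides
// `symbol` in the global scope, or an empty string if it cannot be determined.
std::string OwnerOfGlobalSymbol(const char* symbol) {
  void* addr = dlsym(RTLD_DEFAULT, symbol);
  if (addr == nullptr) {
    return "";
  }
  Dl_info info;
  if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
    return "";
  }
  return info.dli_fname;
}
```

If the path returned for `cudaGetErrorString` or `nvrtcVersion` pointed at the stub library itself rather than at a real libcudart or libnvrtc, that would indicate the self-resolution case the review describes.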

🧹 Nitpick comments (1)

.github/workflows/dist.yml (1)

187-208: Wheel test only checks importability — no torch/CUDA runtime validation.

The test step installs the wheel and runs import tilelang, but doesn't install or test with any PyTorch version. Given the new stub design relies on PyTorch having loaded libcudart/libnvrtc into the global namespace, this test won't exercise the dynamic loading path at all. Consider adding at least a smoke test that installs torch and verifies CUDA stub resolution succeeds.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/target/stubs/cudart.cc`:
- Around line 76-79: The current error path uses __builtin_unreachable() which
is only an optimization hint and doesn't guarantee termination; update the error
handling after the fprintf in the TileLang libcudart check to call abort() (from
stdlib.h) before the __builtin_unreachable() (or replace the builtin call with
abort()) so the process deterministically terminates and the error message is
visible; target the fprintf(...) statement and the following
__builtin_unreachable() call in cudart.cc when making this change.
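
A minimal sketch of the suggested error path, assuming a hypothetical helper pair (`MissingSymbolMessage`, `FailMissingSymbol`) and a made-up message format; the PR's actual wording is not reproduced here. It illustrates why `abort()` is the safer terminator: it is guaranteed not to return, whereas reaching `__builtin_unreachable()` is undefined behavior.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical message format, for illustration only.
std::string MissingSymbolMessage(const char* lib, const char* symbol) {
  return std::string("TileLang: symbol '") + symbol + "' from " + lib +
         " is not available in the process";
}

// Prints the message to stderr and terminates the process via abort(),
// so execution can never continue past the error.
[[noreturn]] void FailMissingSymbol(const char* lib, const char* symbol) {
  std::fprintf(stderr, "%s\n", MissingSymbolMessage(lib, symbol).c_str());
  std::abort();
}
```

Marking the function `[[noreturn]]` gives the compiler the same control-flow information that `__builtin_unreachable()` would, without relying on undefined behavior.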

src/target/stubs/cudart.cc

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/target/stubs/cudart.cc (1)

14-16: ⚠️ Potential issue | 🟡 Minor

Stale doc comment — dlopen() is no longer used.

Lines 15–16 say "The real libcudart is loaded lazily via dlopen() on first API call, and symbols are resolved via dlsym()." After this PR, dlopen is removed entirely; symbols are found via dlsym(RTLD_DEFAULT/RTLD_NEXT, …). Update the header to reflect the new strategy.

src/target/stubs/nvrtc.cc (1)

13-14: ⚠️ Potential issue | 🟡 Minor

Stale doc comment — dlopen() is no longer used.

Lines 13–14 reference lazy loading via dlopen(), but this PR replaces that with dlsym(RTLD_DEFAULT/RTLD_NEXT, …). Update accordingly.

🤖 Fix all issues with AI agents

In `@src/target/stubs/cudart.cc`:
- Around line 66-85: TryLoadLibCudart can return RTLD_DEFAULT (which is
((void*)0) on glibc) or RTLD_NEXT, so the nullptr check in CreateCUDARuntimeAPI
is invalid and causes a spurious "missing library" path; modify the code so
CreateCUDARuntimeAPI does not treat a returned RTLD_DEFAULT as failure — either
remove the handle==nullptr guard entirely or replace it with an explicit check
that treats RTLD_DEFAULT and RTLD_NEXT as valid sentinel handles (e.g., compare
against RTLD_DEFAULT/RTLD_NEXT or use a dedicated bool flag returned by
TryLoadLibCudart); update the logic around TryLoadLibCudart,
CreateCUDARuntimeAPI, and any callers of CUDARuntimeAPI/MissingLibraryError to
rely on that explicit sentinel instead of nullptr.

In `@src/target/stubs/nvrtc.cc`:
- Around line 42-60: The check in CreateNVRTCAPI that treats a nullptr handle as
a failure must be removed because TryLoadLibNvrtc() aborts on error and may
legitimately return RTLD_DEFAULT which is (void*)0 on glibc; update
CreateNVRTCAPI to accept the handle returned by TryLoadLibNvrtc() (including
RTLD_DEFAULT) and proceed to load nvrtc symbols (references: TryLoadLibNvrtc,
CreateNVRTCAPI, NVRTCAPI, nvrtcVersion, RTLD_DEFAULT, RTLD_NEXT) instead of
returning an empty NVRTCAPI when handle == nullptr.
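
The "dedicated bool flag" alternative mentioned for `cudart.cc` could look roughly like the sketch below. The names `ProbeResult` and `ProbeForSymbol` are hypothetical. Judging from the later commit message ("removing null handle checks"), the PR appears to have taken the other option and dropped the guard, so this only illustrates why a null comparison is not a reliable failure test.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc typically requires this for RTLD_DEFAULT / RTLD_NEXT
#endif
#include <dlfcn.h>

// Hypothetical result type: `found` carries success separately, so `handle`
// may hold RTLD_DEFAULT even where that macro expands to a null pointer.
struct ProbeResult {
  bool found;
  void* handle;
};

// Reports which pseudo-handle can resolve `symbol`, if any.
ProbeResult ProbeForSymbol(const char* symbol) {
  if (dlsym(RTLD_DEFAULT, symbol) != nullptr) return {true, RTLD_DEFAULT};
  if (dlsym(RTLD_NEXT, symbol) != nullptr) return {true, RTLD_NEXT};
  return {false, nullptr};
}
```

A caller would branch on `found` and pass `handle` to later `dlsym` calls unchanged, never comparing `handle` against `nullptr`.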

oraluben added 2 commits February 9, 2026 17:02

use nvrtc/cudart come with torch

617f25b

remove cuda 13 job

0df5a83

oraluben requested review from LeiWang1999 and XuehaiPan February 9, 2026 09:47

XuehaiPan marked this pull request as draft February 9, 2026 09:51

coderabbitai bot reviewed Feb 9, 2026

oraluben marked this pull request as ready for review February 9, 2026 09:55

oraluben added 2 commits February 9, 2026 18:33

add assert

6a81fdc

update error message

baf67ac

coderabbitai bot reviewed Feb 9, 2026

resolve issues

3ba309b

coderabbitai bot reviewed Feb 9, 2026

oraluben changed the title ~~[Dist] Use one wheel for different CUDA version~~ [Dist][Release] Use one wheel for different CUDA version Feb 9, 2026

vibe doc

128b8c9

oraluben marked this pull request as draft February 9, 2026 15:30

oraluben and others added 2 commits February 9, 2026 23:32

Refactor CUDART and NVRTC API creation by removing null handle checks. This change simplifies the API initialization process in both cudart.cc and nvrtc.cc, ensuring that the function pointer lookups proceed without early returns.

342de95

lint

ea6e350

oraluben marked this pull request as ready for review February 10, 2026 01:50

oraluben and others added 10 commits February 10, 2026 10:02

Merge branch 'main' into update-nvruntime-stubs

1b44c67

add env back

b4b9e6c

Add test against nightly torch

b1c7925

fix

afe0d42

test with other torch

29c74fc

fix

60d4f6f

fix

51e3975

fix

74e8e57

Print all version check at end

e41ae91

fix tvm

b8e0c3a

oraluben added 2 commits February 10, 2026 14:22

Refactor CUDA and NVRTC stubs to improve version compatibility and lazy loading mechanism. Update documentation to clarify the purpose and implementation of stubs for better understanding.

9dec36b

lint

e4280b0

LeiWang1999 approved these changes Feb 10, 2026

oraluben wants to merge 20 commits into tile-ai:main from oraluben:update-nvruntime-stubs
